Unsupervised Clustering of Morphologically Related Chinese Words

Authors

Chia-Ling Lee

Ya-Ning Chang

Chao-Lin Liu

Chia-Ying Lee

Jane Yung-jen Hsu

Abstract

Many linguists consider morphological awareness a major factor in children’s reading development. A Chinese character embedded in different compound words may carry related but different meanings. For example, “商店(store)”, “商品(commodity)”, “商代(Shang Dynasty)”, and “商朝(Shang Dynasty)” can form two clusters: {“商店”, “商品”} and {“商代”, “商朝”}. In this paper, we aim at the unsupervised clustering of a given family of morphologically related Chinese words. Successfully differentiating these words can contribute to both computer-assisted Chinese learning and natural language understanding. In Experiment 1, we employed linguistic factors at the word, syntactic, semantic, and contextual levels in aggregated computational linguistics methods to handle the clustering task. In Experiment 2, we recruited adults and children to perform the clustering task. Experimental results indicate that our computational model achieved the same level of performance as children.
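The clustering task described in the abstract can be illustrated with a minimal sketch. The sketch assumes hypothetical, hand-assigned semantic tags for each word. These tags are not the word-, syntactic-, semantic-, and contextual-level factors used in the paper. The sketch also swaps in plain average-linkage agglomerative clustering with Jaccard similarity, a standard technique chosen only for illustration; the authors' aggregated method is not reproduced here. The names `FEATURES`, `jaccard`, `average_linkage`, and `agglomerative_cluster` are illustrative.

```python
from itertools import combinations

# HYPOTHETICAL toy features: hand-assigned semantic tags per word.
# These are illustrative assumptions, not the features used in the paper.
FEATURES: dict[str, set[str]] = {
    "商店": {"commerce", "place"},
    "商品": {"commerce", "goods"},
    "商代": {"history", "era"},
    "商朝": {"history", "era", "dynasty"},
}


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity of two feature sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def average_linkage(c1: list[str], c2: list[str], features: dict[str, set[str]]) -> float:
    """Mean pairwise similarity between the members of two clusters."""
    sims = [jaccard(features[x], features[y]) for x in c1 for y in c2]
    return sum(sims) / len(sims)


def agglomerative_cluster(words: list[str], features: dict[str, set[str]], k: int) -> list[list[str]]:
    """Start with singleton clusters and repeatedly merge the most similar pair until k remain."""
    clusters = [[w] for w in words]
    while len(clusters) > k:
        # combinations() yields index pairs (i, j) with i < j.
        i, j = max(
            combinations(range(len(clusters)), 2),
            key=lambda p: average_linkage(clusters[p[0]], clusters[p[1]], features),
        )
        clusters[i].extend(clusters[j])
        del clusters[j]  # safe: j > i, so index i is unaffected
    return clusters


if __name__ == "__main__":
    print(agglomerative_cluster(list(FEATURES), FEATURES, k=2))
    # → [['商店', '商品'], ['商代', '商朝']]
```

With these toy tags, the pair {商代, 商朝} has the highest similarity (2/3) and merges first. {商店, 商品} (1/3) merges next, which reproduces the two clusters given in the abstract. Fixing `k` in advance is itself a simplification of this sketch.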

Similar resources

Semantical Clustering of Morphologically Related Chinese Words

A Chinese character embedded in different compound words may carry different meanings. In this paper, we aim at semantic clustering of a given family of morphologically related Chinese words. In Experiment 1, we employed linguistic features at the word, syntactic, semantic, and contextual levels in aggregated computational linguistics methods to handle the clustering task. In Experiment 2, we r...

Semantic Clustering of Morphologically Related Chinese Words

Unsupervised Sense Clustering of Related Chinese Words

Chinese words which share the same character may carry related but different meanings, e.g., “花錢(spend)”, “花費(expend)”, “花園(garden)”, “開花(bloom)”. The semantics of these words form two clusters: {“花錢(spend)”, “花費(expend)”} and {“花園(garden)”, “開花(bloom)”}. In this paper, we aim at unsupervised clustering of a given set of such related Chinese words, where the quality of clustering results is t...

An Unsupervised Approach to Chinese Word Sense Disambiguation Based on Hownet

The research on word sense disambiguation (WSD) has great theoretical and practical significance in many fields of natural language processing (NLP). This paper presents an unsupervised approach to Chinese word sense disambiguation based on Hownet (an electronic Chinese lexical resource). In our approach, contexts that include ambiguous words are converted into vectors by means of a second-orde...

Statistical Stemming for Kannada

Stemming is a process that groups morphologically related words into the same class and is widely used in information retrieval for improving recall rate. Here we study a set of statistical stemmers for Kannada, a resource-poor language with highly inflectional and agglutinative morphology. We compare stemming using simple truncation, clustering and an unsupervised morpheme segmentation algorit...

Publication date: 2014
